Tvlist feat new #14616

Merged
jt2594838 merged 56 commits into apache:force_ci/split_chunk from shizy818:tvlist-feat-new on Feb 7, 2025

shizy818 commented Jan 2, 2025

Description

Content1 ...

Content2 ...

Content3 ...


This PR has:

  • been self-reviewed.
    • concurrent read
    • concurrent write
    • concurrent read and write
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods.
  • added or updated version, license, or notice information
  • added comments explaining the "why" and the intent of the code wherever would not be obvious
    for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold
    for code coverage.
  • added integration tests.
  • been tested in a test IoTDB cluster.

Key changed/added classes (or packages if there are too many classes) in this PR

* out of mempage bounds check
* overlapped data error during query
* change some list to array
* remember row count in tvlist iterator
@shizy818

Write Performance Test

# iotdb settings
series_slot_num=100
data_region_per_data_node=5

# benchmark settings
DEVICE_NUMBER=100
SENSOR_NUMBER=20
LOOP=10000
DATA_CLIENT_NUMBER=10
DATA_CLIENT_NUMBER=10
OPERATION_PROPORTION=1:0:0:0:0:0:0:0:0:0:0:0

Unaligned Series:

| TVList Sort Threshold | 0 | 100 | 200 | 500 | 1000 | 2000 | 5000 | 10000 | master |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| points/s | 4569824.325 | 4374989.505 | 4342791.53 | 4481212.23 | 4335337.15 | 4217159.025 | 4333778.515 | 4191477.875 | 4355698.645 |
| Comparison (vs. master) | 1.049159893 | 1.004428878 | 0.997036729 | 1.028815948 | 0.995325321 | 0.968193479 | 0.994967482 | 0.96229749 | |

Aligned Series:

| TVList Sort Threshold | 0 | 100 | 200 | 500 | 1000 | 2000 | 5000 | 10000 | master |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| points/s | 6845973 | 6616038.8 | 7266610.87 | 7332207.695 | 6747288.985 | 6925542.685 | 7476763.485 | 7219287.525 | 7161518.61 |
| Comparison (vs. master) | 0.955938729 | 0.923831824 | 1.014674578 | 1.023834203 | 0.942158968 | 0.967049457 | 1.044019277 | 1.008066573 | |
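
The Comparison rows in the write test appear to be each configuration's throughput divided by the master throughput. The following is a minimal sketch for reproducing them, using the Unaligned Series numbers from this comment; the names `THRESHOLDS`, `POINTS_PER_SEC`, `MASTER`, and `compare_to_master` are illustrative and not part of IoTDB or the benchmark tool.

```python
# Throughput in points/s, copied from the Unaligned Series write results.
THRESHOLDS = ["0", "100", "200", "500", "1000", "2000", "5000", "10000"]
POINTS_PER_SEC = [
    4569824.325, 4374989.505, 4342791.53, 4481212.23,
    4335337.15, 4217159.025, 4333778.515, 4191477.875,
]
MASTER = 4355698.645  # baseline measured on the master branch


def compare_to_master(values, baseline):
    """Return each value as a ratio of the baseline (1.0 means equal to master)."""
    return [value / baseline for value in values]


ratios = compare_to_master(POINTS_PER_SEC, MASTER)
for threshold, ratio in zip(THRESHOLDS, ratios):
    print(f"tvlist_sort_threshold={threshold:>5}  ratio={ratio:.9f}")
```

A ratio above 1 means the branch wrote faster than master for that threshold; a ratio below 1 means it wrote slower.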

shizy818 commented Jan 17, 2025

Write/Query mixed mode

# iotdb settings
series_slot_num=100
data_region_per_data_node=5

# benchmark settings
DEVICE_NUMBER=100
SENSOR_NUMBER=20
LOOP=10000
DATA_CLIENT_NUMBER=10
DATA_CLIENT_NUMBER=8
IS_RECENT_QUERY=true
OPERATION_PROPORTION=1:0:1:1:1:1:1:1:0:1:1:1
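
As a reading aid, the sketch below maps `OPERATION_PROPORTION` onto the operation names that appear as result rows in this comment. That the twelve positions follow that row order is an assumption, although it is consistent with PRECISE_POINT and LATEST_POINT being the two rows that report 0. The helper `operation_shares` is hypothetical and not part of the benchmark tool.

```python
# Operation names taken from the result rows; the positional order is assumed.
OPERATIONS = [
    "INGESTION", "PRECISE_POINT", "TIME_RANGE", "VALUE_RANGE",
    "AGG_RANGE", "AGG_VALUE", "AGG_RANGE_VALUE", "GROUP_BY",
    "LATEST_POINT", "RANGE_QUERY_DESC", "VALUE_RANGE_QUERY_DESC", "GROUP_BY_DESC",
]


def operation_shares(proportion):
    """Turn a colon-separated weight string into a share per operation."""
    weights = [int(part) for part in proportion.split(":")]
    if len(weights) != len(OPERATIONS):
        raise ValueError(f"expected {len(OPERATIONS)} weights, got {len(weights)}")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return {op: weight / total for op, weight in zip(OPERATIONS, weights)}


shares = operation_shares("1:0:1:1:1:1:1:1:0:1:1:1")
disabled = [op for op, share in shares.items() if share == 0]
print("disabled operations:", disabled)
print("share of INGESTION:", shares["INGESTION"])
```

Under this assumed mapping, the write-only setting `1:0:0:0:0:0:0:0:0:0:0:0` from the earlier test assigns the whole workload to INGESTION.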

Unaligned Series:

| TVList Sort Threshold | 0 | 100 | 200 | 500 | 1000 | 2000 | 5000 | 10000 | master |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| INGESTION | 1960317.5 | 1943008.65 | 2019711.86 | 2072910.49 | 2051436.53 | 2059098.48 | 1922076.65 | 1835176.4 | 1394176.65 |
| PRECISE_POINT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
| TIME_RANGE | 1480.28 | 1373.54 | 1399.18 | 1417.37 | 1461.29 | 1449.54 | 1394.38 | 1374.51 | 919.82 |
| VALUE_RANGE | 1483.19 | 1412.85 | 1414.09 | 1438.75 | 1470.86 | 1437.06 | 1421.54 | 1390.75 | 894.92 |
| AGG_RANGE | 78.6 | 77.91 | 80.98 | 83.11 | 82.25 | 82.56 | 77.07 | 73.58 | 55.9 |
| AGG_VALUE | 77.45 | 76.77 | 79.8 | 81.9 | 81.05 | 81.35 | 75.94 | 72.51 | 55.08 |
| AGG_RANGE_VALUE | 76.88 | 76.21 | 79.21 | 81.3 | 80.46 | 80.76 | 75.38 | 71.98 | 54.68 |
| GROUP_BY | 1032.57 | 1023.45 | 1063.85 | 1091.87 | 1080.56 | 1084.6 | 1012.42 | 966.65 | 734.36 |
| LATEST_POINT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| RANGE_QUERY_DESC | 1471.21 | 1379.31 | 1387.84 | 1441.23 | 1454.83 | 1446.81 | 1396.66 | 1370.4 | 906.93 |
| VALUE_RANGE_QUERY_DESC | 1476.39 | 1392.44 | 1387.69 | 1425.76 | 1467.74 | 1438.82 | 1392.74 | 1378.52 | 906.6 |
| GROUP_BY_DESC | 1010.02 | 1001.1 | 1040.62 | 1068.03 | 1056.96 | 1060.91 | 990.31 | 945.54 | 718.32 |
| Comparison (INGESTION vs. master) | 1.406086424 | 1.39366856 | 1.448685822 | 1.486844303 | 1.471433344 | 1.476932286 | 1.378647399 | 1.316321417 | |

@shizy818

Aligned Series:

| TVList Sort Threshold | 0 | 100 | 200 | 500 | 1000 | 2000 | 5000 | 10000 | master |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| INGESTION | 1585694.11 | 1976424.22 | 2003598.88 | 2067901.94 | 2034466.94 | 2004544.07 | 1938070.54 | 1841351.19 | 1734845.89 |
| PRECISE_POINT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| TIME_RANGE | 1228.43 | 1480.37 | 1493.53 | 1523.19 | 1508.56 | 1505.3 | 1483.8 | 1425.03 | 1163.35 |
| VALUE_RANGE | 1233.58 | 1505.41 | 1516.68 | 1555.96 | 1536.99 | 1534.92 | 1503.98 | 1440.16 | 1168.66 |
| AGG_RANGE | 63.58 | 79.25 | 80.34 | 82.91 | 81.57 | 80.37 | 77.71 | 73.83 | 69.56 |
| AGG_VALUE | 62.65 | 78.09 | 79.16 | 81.7 | 80.38 | 79.2 | 76.57 | 72.75 | 68.54 |
| AGG_RANGE_VALUE | 62.19 | 77.52 | 78.58 | 81.1 | 79.79 | 78.62 | 76.01 | 72.22 | 68.04 |
| GROUP_BY | 835.24 | 1041.05 | 1055.36 | 1089.23 | 1071.62 | 1055.86 | 1020.85 | 969.9 | 913.8 |
| LATEST_POINT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| RANGE_QUERY_DESC | 1220.13 | 1484.94 | 1492.06 | 1521.5 | 1503.96 | 1517.33 | 1484.09 | 1416.81 | 1151.73 |
| VALUE_RANGE_QUERY_DESC | 1221.62 | 1481.65 | 1496.19 | 1542.3 | 1515.41 | 1516.17 | 1484.18 | 1420.98 | 1166.05 |
| GROUP_BY_DESC | 817 | 1018.31 | 1032.32 | 1065.45 | 1048.22 | 1032.8 | 998.55 | 948.72 | 893.85 |
| Comparison (INGESTION vs. master) | 0.914023606 | 1.139240365 | 1.154914135 | 1.19197852 | 1.172702355 | 1.155451138 | 1.117133747 | 1.061386139 | |

@shizy818

8f78b21

When I use a min-heap for the merge sort, it performs much better when there are several sorted TVLists (tvlist_sort_threshold = 100). However, it performs worse when there is only one TVList (tvlist_sort_threshold = 0).

Not sure if I should revert this change.
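
For readers unfamiliar with the technique, here is a minimal sketch of a k-way merge over already-sorted lists using a min-heap. It is written in Python with the standard `heapq` module for brevity and is not the code in this PR; the function name `merge_sorted_lists` and the `(timestamp, value)` tuple layout are assumptions made for illustration, and handling of duplicate timestamps is left out.

```python
import heapq


def merge_sorted_lists(sorted_lists):
    """Merge lists of (timestamp, value) pairs, each sorted by timestamp."""
    # Each heap entry is (timestamp, list_index, position). The list index
    # breaks ties, so values themselves are never compared.
    heap = [
        (points[0][0], list_index, 0)
        for list_index, points in enumerate(sorted_lists)
        if points
    ]
    heapq.heapify(heap)

    merged = []
    while heap:
        _, list_index, position = heapq.heappop(heap)
        points = sorted_lists[list_index]
        merged.append(points[position])
        next_position = position + 1
        if next_position < len(points):
            # Replace the consumed point with the next one from the same list.
            heapq.heappush(heap, (points[next_position][0], list_index, next_position))
    return merged


merged = merge_sorted_lists([
    [(1, "a"), (4, "d")],
    [(2, "b"), (3, "c")],
    [],
])
print(merged)
```

Each output point costs one pop and at most one push on a heap of size k, where k is the number of lists. With a single list the heap still performs that work without saving any comparisons, which is one plausible reading of the slowdown reported for tvlist_sort_threshold = 0.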

# Datatype: int
tvlist_sort_threshold=0

# When the average point number of timeseries in memtable exceeds this, the memtable is flushed to disk. The default threshold is 100000.
Contributor

Remove this line?

Contributor Author

done

jt2594838 merged commit e8e449e into apache:force_ci/split_chunk on Feb 7, 2025
30 of 32 checks passed
jt2594838 pushed a commit that referenced this pull request Feb 21, 2025
* Split non_aligned charge text chunk

* dev non_aligned

* dev aligned chunk split

* new type

* dev aligned binary chunk split

* Fix binary size calculation

* fix IT

* update IoTDBDuplicateTimeIT.java

* fix pipe IT

* change method names

* add ut

* add UT

* remove useless methods

* fix UT

* fix /FileReaderManagerTest

* fix win UT

* add binary test

* Add Aligned UTs

* fix win ut

* improve coverage

* fix comments

* fix windows UT

* fix review

* fix review

* fix review

* target chunk size count non binary

* fix compile

* fix UT

* Tvlist feat new (#14616)

* null bitmap for int tvlist

* update min/max timestamp and sequential part of tvlist during insert

* mutable & immutable tvlists in writable memchunk

* copy-on-write array list

* review comments part 1

* fix unit test errors

* review comments part 2

* push down global time filter

* fix MemPageReaderTest case

* fix memory page offsets error

* synchronized sort & MergeSortTvListIterator bug

* tvlist_sort_threshold config property

* bug fix:
* out of mempage bounds check
* overlapped data error during query

* optimize TVListIterator & MergeSortTvListIterator

* retrofit encode when tvlist_sort_threshold is zero

* delay sort & statistic generation to query execution

* fix: skip deleted data during encode

* aligned time series part

* fix: MemAlignedChunkReader page offset

* performance issue:
* change some list to array
* remember row count in tvlist iterator

* fix: memory chunk reader may read more points than expected in one page

* update chunk & page statistic for aligned memchunk by column

* revert: getAlignedValueForQuery

* fix: * CopyOnWriteArrayList for AlignedTVList bitmaps
* memory control of column access

* refactor: Tim/Quick/Backward TVList

* refactor: synchronized tvlist method: sort, putXXX

* refactor: change list to array in AlignedTVList iterator

* revert: remove CopyOnWriteArrayList

* refactor: clone MergeSort iterator from ReadOnlyChunk

* fix: clone working tvlist during flush if there is query on it

* fix: writable mem chunk flush conditions

* refactor: add annotation and variable/function rename

* fix: * remove delete method in BinaryTVList
* filter deleted data in WritableMemChunk encode

* fix: remove getSortedTvListForQuery in SeriesRegionScan

* fix: TsFileProcessorTest unit test

* fix: IoTDBNullIdQueryIT.noMeasurementColumnsSelectTest

* fix: delete column of aligned time series

* fix: aligned timeseries encode bug

* fix: IoTDBGroupByNaturalMonthIT

* remove avgSeriesPointNumberThreshold setting

* fix: IoTDBDeleteAlignedTimeseriesIT & AlignedTVListTest

* fix: Copy globalTimeFilter due to GroupByMonthFilter

* reset tmpLength for backward sort

* * fix TVList clear
* bitmap mark
* sequence row count

* hot-load TVLIST_SORT_THRESHOLD

* fix: isNullValue caller

* fix unit test

* refactor: abstract prepareTvListMapForQuery method

* refactor:  clear/clone/expand indices and bitmap

* merge sort using min heap

* fix: WritableMemChunk deserialize

* feat: add index mem cost for TVList

* fix: hot-load tvlist_sort_threshold setting

* remove needless line in property template

---------

Co-authored-by: shizy <shizy04@gmail.com>
HTHou added a commit that referenced this pull request Feb 25, 2025
shizy818 deleted the tvlist-feat-new branch on March 20, 2025 10:38
shizy818 restored the tvlist-feat-new branch on March 20, 2025 10:38